Fast parallel conversion of edge list to adjacency list for large-scale graphs
Authors
Abstract
In the era of big data, we are deluged with massive graph data emerging from numerous social and scientific applications. In most cases, graph data are generated as lists of edges (an edge list), where an edge denotes a link between a pair of entities. However, most graph algorithms work efficiently only when the adjacent nodes of each node (the adjacency list) are readily available. Although the conversion from edge list to adjacency list can be done trivially on the fly for small graphs, it becomes challenging for emerging large-scale graphs consisting of billions of nodes and edges. These graphs do not fit into the main memory of a single computing machine and thus require distributed-memory parallel or external-memory algorithms. In this paper, we present efficient MPI-based distributed-memory parallel algorithms for converting edge lists to adjacency lists. To the best of our knowledge, this is the first work on this problem. To address the critical load balancing issue, we present a parallel load balancing scheme that significantly improves both time and space efficiency. Our fast parallel algorithm works on massive graphs, achieves very good speedups, and scales to a large number of processors. The algorithm can convert an edge list of a graph with 20 billion edges to the adjacency list in less than 2 minutes using 1024 processors. Denoting the number of nodes, edges, and processors by n, m, and P, respectively, the time complexity of our algorithm is O(m/P + n + P), which provides a speedup factor of at least Ω(min{P, d_avg}), where d_avg is the average degree of the nodes. The algorithm has a space complexity of O(m/P), which is optimal.
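As a rough illustration of the conversion the abstract describes, the following Python sketch builds an adjacency list from an edge list, first sequentially and then split across P simulated workers. It is not the paper's MPI algorithm or its load balancing scheme. The function names, the undirected-graph assumption, the node labels 0..n-1, and the equal-size block partitioning are all assumptions made for this example.

```python
from collections import defaultdict


def edge_list_to_adjacency(edges):
    """Sequentially build an adjacency list from (u, v) pairs.

    Assumption: the graph is undirected, so each edge is stored at both ends.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return dict(adj)


def partitioned_conversion(edges, n, P):
    """Simulate P workers, each owning a contiguous block of node ids.

    Assumption: nodes are labelled 0..n-1 and split into equal-size blocks.
    This naive split ignores degree skew; it is only a stand-in for a real
    distributed scheme, where each worker would receive its edges by message
    passing instead of a shared loop.
    """
    block = -(-n // P)  # ceil(n / P) using integer arithmetic
    local = [defaultdict(list) for _ in range(P)]
    for u, v in edges:
        local[u // block][u].append(v)  # owner of u stores neighbour v
        local[v // block][v].append(u)  # owner of v stores neighbour u
    return [dict(d) for d in local]


edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adj = edge_list_to_adjacency(edges)
parts = partitioned_conversion(edges, n=4, P=2)
```

In this toy split, a high-degree node makes its owner hold far more neighbours than the others, which hints at why the abstract treats load balancing as a critical issue.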
Similar resources
Data structure for representing a graph: combination of linked list and hash table
In this article we discuss a data structure that combines the advantages of two different ways of representing graphs: the adjacency matrix and a collection of adjacency lists. This data structure can add and search for edges quickly (the advantage of an adjacency matrix), uses a linear amount of memory, and allows the adjacency list of a given vertex to be obtained (the advantages of a collection of adjacency lists). Basic knowledge of ...
Edge-Coloring Bipartite Multigraphs to Select Network Paths
We consider the idea of using a centralized controller to schedule network traffic within a datacenter and implement an algorithm that edge-colors bipartite multigraphs to select the paths that packets should take through the network. We implement three different data structures to represent the bipartite graphs: a matrix data structure, an adjacency list data structure, and an adjacency list d...
Pipelined Workflow in Hybrid MPI/Pthread runtime for External Memory Graph Construction
Graph construction from a given set of edges is a data-intensive operator that appears in social network analysis, ontology-enabled databases, and other analytics processing. The operator converts an edge list into what is called the compressed sparse row (CSR) representation (or sometimes into an adjacency list, or into clustered B-Tree storage). In this work, we show how to scale CSR construction to ma...
An Introduction to Graph Compression Techniques for In-memory Graph Computation
In this work we attempt to answer the following question: How large a graph can we process using a vertex-centric model of computation in the main memory of a single machine? Specifically, we use a modified Pregel framework to calculate PageRank, identify connected components, and single source shortest path algorithms on two large graphs. While it is not possible to load these graphs into memo...
Merging Adjacency Lists for Efficient Web Graph Compression
Analysing Web graphs is difficult because a major part of a huge graph must be stored in external memory, which prevents efficient random access to edge (hyperlink) lists. A number of algorithms involving compression techniques have thus been presented to represent Web graphs succinctly while still providing random access. Our algorithm belongs to this category. It works on contiguo...